Smart Paper Notes

KDD CUP 2022 Wind Power Forecasting Team 88VIP Solution

Fangquan Lin , Wei Jiang , Hanwei Zhang , Cheng Yang

Categories: Machine Learning | Artificial Intelligence

2022-08-18

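KDD CUP 2022 proposes a time-series forecasting task on a spatial dynamic wind power dataset, in which participants are required to predict future power generation given historical context factors. The evaluation metrics include RMSE and MAE. This paper describes the solution of Team 88VIP, which mainly comprises two types of models: a gradient boosting decision tree to memorize the basic data patterns, and a recurrent neural network to capture the deep and latent probabilistic transitions. Ensembling these models helps to cope with the fluctuation of wind power, while training sub-models targets the distinct properties of the heterogeneous forecasting timescales, from minutes to days. In addition, the feature engineering, the imputation techniques, and the design of the offline evaluation are described in detail. The proposed solution achieves an overall online score of -45.213 in Phase 3.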
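
The two metrics named in the abstract, and the idea of combining two models' forecasts, can be illustrated with a minimal sketch. The abstract does not say how RMSE and MAE are combined into the online score, nor how the tree model and the recurrent model are ensembled, so the `blend` function and its `weight_a` parameter below are hypothetical and are not the team's actual procedure.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def blend(pred_a, pred_b, weight_a=0.5):
    """Weighted average of two forecasts (hypothetical ensembling rule)."""
    if len(pred_a) != len(pred_b):
        raise ValueError("forecasts must have equal length")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


if __name__ == "__main__":
    actual = [1.0, 2.0, 3.0]
    tree_forecast = [1.0, 2.0, 5.0]       # made-up numbers for illustration
    recurrent_forecast = [1.0, 2.0, 1.0]  # made-up numbers for illustration
    combined = blend(tree_forecast, recurrent_forecast)
    print(rmse(actual, tree_forecast), mae(actual, tree_forecast))
    print(rmse(actual, combined), mae(actual, combined))
```

In this toy example the two forecasts err in opposite directions on the last point, so their average has lower error than either one alone, which is the usual motivation for ensembling.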

Bayesian Optimization with Clustering and Rollback for CNN Auto Pruning

Hanwei Fan , Jiandong Mu , Wei Zhang

Categories: Machine Learning | Artificial Intelligence

2021-09-22

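Pruning is an effective technique for compressing convolutional neural network (CNN) models, but finding the optimal pruning policy is difficult because the design space is large. To improve the usability of pruning, many automatic pruning methods have been developed. Recently, Bayesian optimization (BO) has been considered a competitive algorithm for auto pruning because of its solid theoretical foundation and high sampling efficiency. However, BO suffers from the curse of dimensionality: its performance deteriorates when pruning deep CNNs, since the dimension of the design space increases. We propose a novel clustering algorithm that reduces the dimension of the design space to speed up the search process. Subsequently, a rollback algorithm is proposed to recover the high-dimensional design space so that higher pruning accuracy can be obtained. We validate the proposed method on ResNet, MobileNetV1, and MobileNetV2 models. Experiments show that the proposed method significantly improves the convergence rate of BO when pruning deep CNNs, with no increase in running time. The source code is available at https://github.com/fanhanwei/bocr.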

From Distance to Dependency: A Paradigm Shift of Full-reference Image Quality Assessment

Hanwei Zhu , Baoliang Chen , Lingyu Zhu , Shiqi Wang

Categories: Computer Vision

2022-11-09

Deep learning-based full-reference image quality assessment (FR-IQA) models typically rely on the feature distance between the reference and distorted images. However, the underlying assumption of these models that the distance in the deep feature domain could quantify the quality degradation does not scientifically align with the invariant texture perception, especially when the images are generated artificially by neural networks. In this paper, we bring a radical shift in inferring the quality with learned features and propose the Deep Image Dependency (DID) based FR-IQA model. The feature dependency facilitates the comparisons of deep learning features in a high-order manner with Brownian distance covariance, which is characterized by the joint distribution of the features from reference and test images, as well as their marginal distributions. This enables the quantification of the feature dependency against nonlinear transformation, which is far beyond the computation of the numerical errors in the feature space. Experiments on image quality prediction, texture image similarity, and geometric invariance validate the superior performance of our proposed measure.
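
To make the notion of dependency concrete, the following is a minimal sketch of the sample Brownian distance covariance and the distance correlation derived from it (in the sense of Székely and Rizzo), written in plain Python. It is a generic statistic on paired samples, not the DID model itself: how the paper extracts features and aggregates the statistic into a quality score is not described in the abstract, and the function names here are illustrative.

```python
import math


def _centered_distances(points):
    """Double-centered Euclidean distance matrix of a list of coordinate tuples."""
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    row_mean = [sum(row) / n for row in dist]  # equals column means (symmetric)
    grand_mean = sum(row_mean) / n
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_covariance_sq(xs, ys):
    """Squared sample distance covariance of paired samples xs and ys."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and paired")
    n = len(xs)
    a = _centered_distances(xs)
    b = _centered_distances(ys)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(xs, ys):
    """Sample distance correlation in [0, 1]; 0.0 if either sample is constant."""
    dcov_sq = distance_covariance_sq(xs, ys)
    dvar_prod = distance_covariance_sq(xs, xs) * distance_covariance_sq(ys, ys)
    if dvar_prod <= 0.0:
        return 0.0
    return math.sqrt(max(dcov_sq, 0.0) / math.sqrt(dvar_prod))


if __name__ == "__main__":
    x = [(-2.0,), (-1.0,), (0.0,), (1.0,), (2.0,)]
    y_square = [(v[0] ** 2,) for v in x]  # nonlinear function of x
    print(distance_correlation(x, y_square))
```

In the usage example, `y_square` has zero Pearson correlation with `x` because the relationship is symmetric, yet the distance correlation is clearly positive. This is the kind of nonlinear dependency that a plain feature distance does not measure.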

Deep Feature Statistics Mapping for Generalized Screen Content Image Quality Assessment

Baoliang Chen , Hanwei Zhu , Lingyu Zhu , Shiqi Wang , Sam Kwong

Categories: Computer Vision

2022-09-12

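The statistical regularities of natural images, referred to as natural scene statistics, play an important role in no-reference image quality assessment. However, it is widely acknowledged that screen content images (SCIs), which are typically computer generated, do not exhibit such statistics. Here we make the first attempt to learn the statistics of SCIs, based on which the quality of SCIs can be effectively determined. The underlying mechanism of the proposed approach rests on the bold assumption that SCIs, although not physically acquired, still obey certain statistics that can be understood in a learning fashion. We show empirically that the statistics deviation can be effectively leveraged in quality assessment, and that the proposed method is superior when evaluated in different settings. Extensive experimental results demonstrate that the Deep Feature Statistics based SCI Quality Assessment (DFSS-IQA) model delivers promising performance compared with existing NR-IQA models and shows high generalization capability in cross-dataset settings. The implementation of our method is publicly available at https://github.com/baoliang93/dfss-iqa.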

DeepWSD: Projecting Degradations in Perceptual Space to Wasserstein Distance in Deep Feature Space

Xingran Liao , Baoliang Chen , Hanwei Zhu , Shiqi Wang , Mingliang Zhou , Sam Kwong

Categories: Computer Vision

2022-08-05

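Existing deep learning-based full-reference IQA (FR-IQA) models usually predict image quality in a deterministic way by explicitly comparing features, gauging how severely an image is distorted by how far its features lie from the space of the reference images. Here, we look at this problem from a different viewpoint and propose to model the quality degradation in perceptual space from a statistical distribution perspective. Accordingly, quality is measured by the Wasserstein distance in the deep feature domain. More specifically, the 1D Wasserstein distance is measured at each stage of a pre-trained VGG network, and the final quality score is computed from these distances. The deep Wasserstein distance (DeepWSD), computed on neural network features, offers better interpretability of the quality contamination caused by various types of distortion and delivers advanced quality prediction capability. Extensive experiments and theoretical analysis show the superiority of the proposed DeepWSD in terms of both quality prediction and optimization.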
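
For readers unfamiliar with the 1D Wasserstein distance, here is a minimal sketch for two equal-size samples: in one dimension the optimal transport plan simply matches sorted values, so the distance reduces to an average over sorted differences. The per-stage aggregation in `summed_stage_distance` is a hypothetical placeholder, since the abstract does not specify how the stage-wise distances are turned into the final score, and plain lists of numbers stand in for VGG features.

```python
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two equal-size 1D empirical samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    # In 1D the optimal coupling pairs the i-th smallest values of each sample.
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (cost / len(xs)) ** (1.0 / p)


def summed_stage_distance(reference_stages, distorted_stages, p=1):
    """Sum of per-stage distances (hypothetical aggregation rule)."""
    if len(reference_stages) != len(distorted_stages):
        raise ValueError("both inputs must have the same number of stages")
    return sum(
        wasserstein_1d(ref, dis, p)
        for ref, dis in zip(reference_stages, distorted_stages)
    )


if __name__ == "__main__":
    reference = [[0.0, 1.0, 2.0], [5.0, 5.0]]  # stand-ins for flattened features
    distorted = [[1.0, 2.0, 3.0], [5.0, 7.0]]
    print(wasserstein_1d(reference[0], distorted[0]))
    print(summed_stage_distance(reference, distorted))
```

Because the values are sorted before comparison, the distance depends only on the two value distributions and not on the order of the elements, which is what distinguishes it from an element-wise feature distance.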

No-Reference Image Quality Assessment by Hallucinating Pristine Features

Baoliang Chen , Lingyu Zhu , Chenqi Kong , Hanwei Zhu , Shiqi Wang , Zhu Li

Categories: Computer Vision

2021-08-09

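In this paper, we propose a no-reference (NR) image quality assessment (IQA) method based on feature-level pseudo-reference (PR) hallucination. The proposed quality assessment framework is grounded in prior models of natural image statistical behavior and rooted in the view that perceptually meaningful features can be exploited to characterize visual quality. Here, the PR features of a distorted image are learned through a mutual learning scheme with the pristine reference as supervision, and the discriminative characteristics of the PR features are further ensured with triplet constraints. Given a distorted image for quality inference, feature-level disentanglement is performed with an invertible neural layer, yielding the PR features and the corresponding distortion features, which are compared for the final quality prediction. The effectiveness of the proposed method is demonstrated on four popular IQA databases, and its superior performance in cross-database evaluation also reveals the high generalization capability of our method. The implementation of our method is publicly available at https://github.com/baoliang93/fpr.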

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

Sucheng Ren , Fangyun Wei , Zheng Zhang , Han Hu

Categories: Computer Vision

2023-01-03

Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models, which are critical for real-world applications, benefit only marginally or not at all from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.

Cross Modal Transformer via Coordinates Encoding for 3D Object Detection

Junjie Yan , Yingfei Liu , Jianjian Sun , Fan Jia , Shuailin Li , Tiancai Wang , Xiangyu Zhang

Categories: Computer Vision

2023-01-03

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.

Backdoor Attacks Against Dataset Distillation

Yugeng Liu , Zheng Li , Michael Backes , Yun Shen , Yang Zhang

Categories: Machine Learning

2023-01-03

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.

PMT-IQA: Progressive Multi-task Learning for Blind Image Quality Assessment

Qingyi Pan , Ning Guo , Letu Qingge , Jingyi Zhang , Pei Yang

Categories: Computer Vision

2023-01-03

Blind image quality assessment (BIQA) remains challenging due to the diversity of distortion and image content variation, which complicate the distortion patterns crossing different scales and aggravate the difficulty of the regression problem for BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies to make the regression model produce better performance. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression issue to align with the law of human learning process from easy to hard. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets, and the experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and both MS and PMT modules improve the model's performance.
